A FPGA Embedded DSP Supporting Parallel Multiple Low Bit-Width Multiply-Accumulate Operations
Abstract
With the continuous development of big data and hardware computing platforms, deep learning has been widely applied in many intelligent scenarios. Recent studies have shown that low bit-width network inference can effectively improve the overall performance of an accelerator by reducing its computational requirements while maintaining recognition accuracy. In particular, low bit-width convolution operations, such as 8-bit and 4-bit, are widely used in applications such as graph recognition. The FPGA is a core device in digital systems, and thanks to its excellent reconfigurability it has become one of the mainstream platforms in this field. Current FPGAs are built from higher bit-width multipliers because they need to adapt to different application requirements. When DSP module resources perform low bit-width operations, those operations occupy only part of the multiplier bit-width, which wastes a large amount of on-chip resources. Therefore, this paper proposes a new DSP architecture that computes multiple low bit-width multiplications in parallel, so that the new DSP can realize double multiply-accumulate operations without adding multipliers and can support any combination of signed and unsigned operations. The design is based on the commercial Stratix IV DSP architecture, and the circuit is designed with the SMIC 14nm standard CMOS process. The experimental results show that, when calculating the same number of 4-bit and 8-bit multiply-accumulate operations, the resource consumption (area) is reduced by 43.5% and the speed is increased by 48%.
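The idea of computing two low bit-width products on one wide multiplier can be illustrated in software. The following is a minimal sketch of generic operand packing, not the paper's circuit: the function names (`packed_dual_multiply`, `packed_dual_mac`), the shared-operand arrangement, and the field width of twice the operand width are all assumptions made for illustration. Two operands `a` and `c` are packed into one wide word, multiplied once by a shared operand `b`, and the two products are then separated, with a sign correction when the low product can be negative.

```python
def _check_range(value, width, signed):
    """Raise ValueError if value does not fit in `width` bits (hypothetical helper)."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits (signed={signed})")


def packed_dual_multiply(a, c, b, width=4,
                         a_signed=False, c_signed=False, b_signed=False):
    """Return (a*b, c*b) using a single wide multiplication.

    Assumption: each product is given a field of 2*width bits, which is
    enough for any signed/unsigned combination of width-bit operands.
    """
    _check_range(a, width, a_signed)
    _check_range(c, width, c_signed)
    _check_range(b, width, b_signed)

    shift = 2 * width
    packed = (a << shift) + c          # a in the high field, c in the low field
    wide = packed * b                  # the single "hardware" multiplication

    low = wide & ((1 << shift) - 1)    # raw low field, c*b modulo 2**shift
    # If c*b can be negative, reinterpret the low field as two's complement.
    if (c_signed or b_signed) and low >= (1 << (shift - 1)):
        low -= 1 << shift
    # Removing the low product also undoes the borrow it took from the high field.
    high = (wide - low) >> shift
    return high, low


def packed_dual_mac(a_vec, c_vec, b_vec, **kwargs):
    """Accumulate two dot products that share the operand vector b_vec."""
    acc_a = acc_c = 0
    for a, c, b in zip(a_vec, c_vec, b_vec):
        prod_a, prod_c = packed_dual_multiply(a, c, b, **kwargs)
        acc_a += prod_a
        acc_c += prod_c
    return acc_a, acc_c


print(packed_dual_multiply(3, -5, 7, c_signed=True))  # (21, -35)
```

The sign correction is the step that makes mixed signed and unsigned operands work: a negative low product borrows from the high field, so it has to be subtracted from the wide result before shifting. How the proposed DSP handles this in hardware is not described in the abstract.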
Similar resources
Low Power Multiply Accumulate Unit (MAC) for DSP Applications
Wireless Sensor Network (WSN) presents significant challenges for the application of distributed signal processing and distributed control. These systems will challenge us to apply appropriate techniques to construct capable processing units with sensing nodes considering energy constraints. Digital Signal Processing (DSP) is one of the capable processing units, but it is not commonly used in W...
A DSP-Enhanced 32-Bit Embedded Microprocessor
EISC (Extendable Instruction Set Computer) is a compressed code architecture developed for embedded applications. In this paper, we propose a DSP-enhanced embedded microprocessor based on the 32-bit EISC architecture. We present how we could exploit the special features, and how we could overcome the deficits, of the EISC architecture to accelerate DSP applications with a relatively low hardwar...
Reduced Redundant Arithmetic Applied on Low Power Multiply-Accumulate Units
We propose a new redundant approach on designing multiply-accumulate units for low power. State of the art implementations make use of redundant registers to obtain low delay times by moving any carry propagate adder out of the operation cycle. Our contribution is optimizing the level of redundancy by adjusting the size of the carry register. This optimization is performed by a VHDL generator, ...
Development of a large word-width high-speed asynchronous multiply and accumulate unit
This paper details the design of the fastest known asynchronous Multiply and Accumulate unit (MAC) architecture published to date. The MAC architecture herein is based on the MAC developed in Smith et al. (J. Syst. Archit. 47/12 (2002) 977–998). However, the MAC developed in Smith et al. (2002) contains conditional rounding, scaling, and saturation (CRSS) logic, not present in other comparable ...
A New Interval Approximation Supporting Bit Operations and Byte Access
In this paper we present a new variant of the commonly used interval approximation based on the so-called valid interval approach. This new approach supports arithmetic and bit operations including shift and rotate functions. Furthermore, it allows read and write access to the byte representation of integer and floating point values. We present the necessary functions for integer intervals in d...
Journal
Journal title: Advances in Transdisciplinary Engineering
Year: 2023
ISSN: 2352-751X, 2352-7528
DOI: https://doi.org/10.3233/atde230112